AITopics | policy teaching

Collaborating Authors

policy teaching

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Envy-free Policy Teaching to Multiple Agents

Neural Information Processing SystemsAug-14-2025, 22:54:06 GMT

We study envy-free policy teaching.

adjustment scheme, agent, target policy, (15 more...)

Neural Information Processing Systems

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Policy Teaching via Data Poisoning in Learning from Human Preferences

Nika, Andi, Nöther, Jonathan, Mandal, Debmalya, Kamalaruban, Parameswaran, Singla, Adish, Radanović, Goran

arXiv.org Artificial IntelligenceMar-13-2025

We study data poisoning attacks in learning from human preferences. More specifically, we consider the problem of teaching/enforcing a target policy $\pi^\dagger$ by synthesizing preference data. We seek to understand the susceptibility of different preference-based learning paradigms to poisoned preference data by analyzing the number of samples required by the attacker to enforce $\pi^\dagger$. We first propose a general data poisoning formulation in learning from human preferences and then study it for two popular paradigms, namely: (a) reinforcement learning from human feedback (RLHF) that operates by learning a reward model using preferences; (b) direct preference optimization (DPO) that directly optimizes policy using preferences. We conduct a theoretical analysis of the effectiveness of data poisoning in a setting where the attacker is allowed to augment a pre-existing dataset and also study its special case where the attacker can synthesize the entire preference dataset from scratch. As our main results, we provide lower/upper bounds on the number of samples required to enforce $\pi^\dagger$. Finally, we discuss the implications of our results in terms of the susceptibility of these learning paradigms under such data poisoning attacks.

dpo, learning, rlhf, (16 more...)

arXiv.org Artificial Intelligence

2503.10228

Country:

Asia > Thailand (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Policy Teaching in Reinforcement Learning via Environment Poisoning Attacks

Rakhsha, Amin, Radanovic, Goran, Devidze, Rati, Zhu, Xiaojin, Singla, Adish

arXiv.org Artificial IntelligenceNov-21-2020

We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker. As a victim, we consider RL agents whose objective is to find a policy that maximizes reward in infinite-horizon problem settings. The attacker can manipulate the rewards and the transition dynamics in the learning environment at training-time, and is interested in doing so in a stealthy manner. We propose an optimization framework for finding an optimal stealthy attack for different measures of attack cost. We provide lower/upper bounds on the attack cost, and instantiate our attacks in two settings: (i) an offline setting where the agent is doing planning in the poisoned environment, and (ii) an online setting where the agent is learning a policy with poisoned feedback. Our results show that the attacker can easily succeed in teaching any target policy to the victim under mild conditions and highlight a significant security threat to reinforcement learning agents in practice.

agent, algorithm, optimization problem, (14 more...)

arXiv.org Artificial Intelligence

2011.10824

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
Europe > Germany > Saarland > Saarbrücken (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning

Rakhsha, Amin, Radanovic, Goran, Devidze, Rati, Zhu, Xiaojin, Singla, Adish

arXiv.org Artificial IntelligenceMar-28-2020

We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker. As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings. The attacker can manipulate the rewards or the transition dynamics in the learning environment at training-time and is interested in doing so in a stealthy manner. We propose an optimization framework for finding an \emph{optimal stealthy attack} for different measures of attack cost. We provide sufficient technical conditions under which the attack is feasible and provide lower/upper bounds on the attack cost. We instantiate our attacks in two settings: (i) an \emph{offline} setting where the agent is doing planning in the poisoned environment, and (ii) an \emph{online} setting where the agent is learning a policy using a regret-minimization framework with poisoned feedback. Our results show that the attacker can easily succeed in teaching any target policy to the victim under mild conditions and highlight a significant security threat to reinforcement learning agents in practice.

agent, optimization problem, policy teaching, (14 more...)

arXiv.org Artificial Intelligence

2003.12909

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback